INTRODUCTION:

The occurrence and determination of amines have drawn considerable interest in recent years, as environmental issues and global environmental change are garnering ever-increasing attention worldwide. These amines are a major cause of social and sanitary problems and can be found in a variety of environmental media, including air, water, soil, and food.

Nitroaromatics are dangerous substances whose toxic effects include skin hypersensitivity, immunotoxicity, germ cell degeneration, inhibition of liver enzymes, and suspected carcinogenicity. The lack of experimental data has made it difficult to model the toxicity of nitroaromatic chemicals. Because they are widely used in industry, nitrobenzenes (NBs) have a significant potential to pollute the environment; they have even been found in surface waters ¹.

Structure-toxicity models draw on biology, chemistry, and statistics. The intersection of these three disciplines has allowed structure-activity relationships to become a recognized specialty within toxicology.

Quantitative structure-activity relationships (QSARs) are expected to be used more frequently over the coming ten years to forecast the toxicity of both new and existing compounds. The use of these methods to reduce or eliminate animal testing in the regulation of existing chemicals will receive a lot of attention (e.g. in the REACH legislation) ². The publication in 1962 of a paper ³ demonstrating a relationship between biological activity and the octanol-water partition coefficient is regarded as the official birth of QSAR ⁴. In the current paper, simple linear regression QSAR models are proposed to assess the relative toxicity of organic chemicals, expressed as the logarithm of the inverse of the 50% inhibitory growth concentration (IGC50) for Tetrahymena pyriformis.

The mean is the best measure of central tendency, the standard deviation is the best measure of dispersion, and the method of least squares (L2 estimate) is the best method of regression, provided that the values of the independent variables are known with precision and the errors in the observations of the dependent variable are normally distributed.

These conditions fail in particular when the dependent variable includes aberrant data, that is, observations that lie far from the bulk of the data.

Regression by the least squares (LS) approach performs best when the errors follow a normal distribution. When the data do not meet the normality assumption because of outliers and/or multicollinearity, the LS method should not be used to estimate the regression parameters, since the estimation error of the parameters can increase ⁵.

For situations in which the observed data contain varying degrees of outliers, the limits of the traditional least squares diagnostics are tested directly against a robust estimation method, the non-parametric least absolute deviation (LAD) regression ⁶.

Such a comparison makes it possible to question several aspects of the problem and, where necessary, to change the modelling approach, as is done in this work.

The objective of this study is to choose the best analysis method for a simple linear model when faced with a major problem in regression analysis, namely errors that are not normally distributed, or that are normally distributed but disturbed by aberrant data. The study was conducted on 30 compounds (a mixed series of 21 linear and branched-chain alcohols and 9 normal aliphatic amines), using the 50% inhibitory growth concentration (IGC50) for Tetrahymena pyriformis, divided into a training set of 20 compounds and a test set of 10.

The normality of the errors was checked with the Anderson-Darling (AD) test, with the coefficient of symmetry (skewness) and the coefficient of flatness (kurtosis), and with the Jarque-Bera test, which combines these two coefficients. These tests indicated a non-normal distribution of errors, which rules out the least squares method. With the least absolute deviation method, we check the goodness of fit and detect the aberrant data using a line plot. Finally, the standard error and the coefficient of determination were used to judge the quality of fit after removal of the aberrant values.

MATERIALS AND METHODS:

DATA SET:

Nine normal aliphatic amines and a set of 21 linear and branched-chain alcohols, chosen to represent a range of chain length and branching, were used to evaluate the toxicity.

The test organism, Tetrahymena pyriformis, is the most studied common freshwater hymenostome ciliate; it measures roughly 50 µm in length and 30 µm in width. Its growth is inhibited by these toxicants, which are both nonionic and nonreactive.

The ciliates were cultivated in axenic culture and, after 48 h of incubation, the population density was determined spectrophotometrically as optical density (absorbance) at 540 nm. Schultz provided the experimental data set.

Descriptor Generation:

The chemical structure of each compound was drawn with the Hyperchem program ⁹ and pre-optimized using the MM+ molecular mechanics method (Polak-Ribiere algorithm).

The final geometries of the minimum-energy conformations were then determined with the semi-empirical PM3 method at the restricted Hartree-Fock level with no configuration interaction, using a gradient norm limit of 0.01 kcal.A^-1.mol^-1 as the stopping criterion. The resulting geometries served as input to the DRAGON software (version 5.3) ¹⁰ for the generation of 74 3D geometrical descriptors.

Geometrical descriptors are defined from a molecule's three-dimensional structure, which requires knowledge of the atoms' relative positions in 3D; they offer information and the ability to distinguish between molecular structures and molecular conformations.

In the MOBYDIGS software of Todeschini ¹¹, the descriptor subsets were selected with a genetic algorithm, using the calibration data to maximize the prediction coefficient.

Statistical analysis:

Simple linear regression:

In this work, we relied on mathematical software applied to a one-variable equation, that is, a first-order linear equation, in its general form ¹¹⁻¹²:

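$F(x) = a + bx$ (1)

$b$: slope of the line (regression coefficient), where $r$ is the correlation coefficient between $x$ and $y$:

$b = r \, \dfrac{s_y}{s_x}$ (2)

$s_y$: standard deviation of the $y$ values:

$s_y = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2}$ (3)

$s_x$: standard deviation of the $x$ values:

$s_x = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ (4)

$a$: intersection point of the line with the ordinate axis (intercept):

$a = \bar{y} - b\,\bar{x}$ (5)

$\bar{y}$: mean of the $y$ values:

$\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$ (6)

$\bar{x}$: mean of the $x$ values:

$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ (7)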

By the Least Squares LS method:

The least squares method is the oldest and simplest method of linear regression. It applies under certain conditions ¹¹, the most important of which is examined in this study. Its principle is to minimize the sum of the squared differences between the predicted and observed values.
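The least squares principle described above can be illustrated with a short Python sketch. It assumes the slope is obtained as b = r·s_y/s_x and the intercept as a = ȳ − b·x̄; the function name `least_squares_fit` and the sample values are illustrative assumptions, not the authors' data or software.

```python
import math

def least_squares_fit(x, y):
    """Return (a, b) of the line y = a + b*x fitted by least squares.

    Uses b = r * s_y / s_x and a = mean(y) - b * mean(x).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_x = math.sqrt(sxx / (n - 1))   # standard deviation of x
    s_y = math.sqrt(syy / (n - 1))   # standard deviation of y
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    b = r * s_y / s_x
    a = y_mean - b * x_mean
    return a, b

# Illustrative values only (hypothetical, not taken from Table 2)
x_demo = [0.0, 1.0, 2.0, 3.0]
y_demo = [1.0, 3.0, 5.0, 7.0]
a_demo, b_demo = least_squares_fit(x_demo, y_demo)
print(a_demo, b_demo)
```

Writing the slope through r, s_x and s_y is algebraically the same as the usual ratio of the cross-product sum to the sum of squares of x; the form above was chosen only because it uses the standard deviations and means listed in this section.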

The statistical analysis of the model is based on the following factors, where t is the Student value for the chosen risk level and degrees of freedom.

Internal Training Criteria:

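$R^2$ is the coefficient of determination ¹¹⁻¹²:

$R^2 = 1 - \dfrac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$ (8)

(Calculated according to equation 2) of the model ¹¹

(9)

$S$ is the standard deviation ¹¹⁻¹²:

$S = \sqrt{\dfrac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - p - 1}}$ (10)

$R^2_{adj}$ is the adjusted $R^2$ ¹¹⁻¹²:

$R^2_{adj} = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$ (11)

Here $y_i$ and $\hat{y}_i$ are the observed and calculated values, $\bar{y}$ is the mean of the observed values, $n$ is the number of training compounds and $p$ is the number of descriptors in the model.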

The P-value is the probability of obtaining a test statistic at least as extreme as the value actually calculated, if the null hypothesis is true.

External validation criteria:

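The external $Q^2_{ext}$ is defined as:

$Q^2_{ext} = 1 - \dfrac{\sum_{i=1}^{n_{ext}}(y_i - \hat{y}_i)^2 / n_{ext}}{\sum_{i=1}^{n_{tr}}(y_i - \bar{y}_{tr})^2 / n_{tr}}$ (12)

where $y_i$ and $\hat{y}_i$ are the measured and predicted values of the dependent variable (over the prediction set), $\bar{y}_{tr}$ is the mean value of the dependent variable for the training set, and $n_{tr}$ and $n_{ext}$ are the numbers of compounds in the training set and in the external set.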
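A minimal Python sketch of this external criterion follows. It assumes the form in which the prediction error sum of squares over the external set and the total sum of squares over the training set are each divided by their own number of compounds; the function name `q2_ext` and the numbers are hypothetical.

```python
def q2_ext(y_ext, y_ext_pred, y_train):
    """External predictive coefficient.

    y_ext      : measured values in the external (prediction) set
    y_ext_pred : values predicted by the model for the external set
    y_train    : measured values in the training set (for its mean)
    """
    n_ext = len(y_ext)
    n_tr = len(y_train)
    y_tr_mean = sum(y_train) / n_tr
    # Mean squared prediction error over the external set
    press_ext = sum((yo - yp) ** 2 for yo, yp in zip(y_ext, y_ext_pred)) / n_ext
    # Mean squared deviation of the training responses from their mean
    ss_tr = sum((yt - y_tr_mean) ** 2 for yt in y_train) / n_tr
    return 1.0 - press_ext / ss_tr

# Illustrative values only (hypothetical)
print(q2_ext([1.0, 2.0], [1.5, 2.0], [0.0, 1.0, 2.0, 3.0]))
```

A value of 1 corresponds to perfect external prediction; the value decreases as the external errors grow relative to the spread of the training responses.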

The model needs to demonstrate:

Normality of residuals:

For these tests to be valid, the distribution of the residuals must be normal, so that the interpretation of the typical forecast error is not distorted.

Anderson-Darling Test (AD) ¹²:

The Anderson-Darling test is an alternative to the Kolmogorov-Smirnov test, with the difference that it places more emphasis on the tails of the distribution.

A normal distribution has a symmetry coefficient (skewness) of 0 and a flatness coefficient (kurtosis) of 3.

The Jarque-Bera Test ¹²:

The Jarque-Bera test (1984) is based on the concepts of skewness (asymmetry; the normal law is a symmetrical law) and kurtosis (flatness, which indicates the weight of the distribution tails). It allows us to determine whether a statistical distribution is normal; under normality, the test statistic follows a chi-square law with two degrees of freedom.
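As a hedged illustration of how skewness, kurtosis and the Jarque-Bera statistic are obtained from a set of residuals, the sketch below uses the usual moment-based definitions, JB = n/6 · (S² + (K − 3)²/4). The function name `jarque_bera` and the residual values are assumptions for illustration; the Anderson-Darling statistic is not reproduced here.

```python
def jarque_bera(residuals):
    """Return (skewness, kurtosis, JB) computed from central moments.

    skewness = m3 / m2**1.5, kurtosis = m4 / m2**2 (equal to 3 for a normal law),
    JB = n/6 * (skewness**2 + (kurtosis - 3)**2 / 4).
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return skewness, kurtosis, jb

# Illustrative residuals only (hypothetical, symmetric by construction)
sk, ku, jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(sk, ku, jb)
```

The JB value is compared with the chi-square distribution with two degrees of freedom: a large value leads to rejection of normality.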

By The Least Absolute Deviation (LAD) Method:

Since the method of least squares places heavy weight on the largest error terms, we turn to a more robust alternative estimator that minimizes the absolute values of the error terms rather than their squares. The least absolute deviation (LAD) estimator, suggested by Gauss and Laplace, belongs to the family of quantile estimators. Unlike least squares, this method does not put excessive weight on strongly divergent observations and thus produces estimators that are more robust to aberrant values.
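As a minimal sketch of the LAD principle (not the authors' MATLAB program), the following Python function fits a simple line by exhaustive search. It relies on the known property that at least one optimal LAD line passes through two of the data points, which makes brute force practical for a data set of about 20 compounds. The function name `lad_fit` and the sample values are illustrative assumptions.

```python
from itertools import combinations

def lad_fit(x, y):
    """Return (a, b) minimizing sum(|y_i - (a + b*x_i)|).

    Every line through two data points is tried and the one with the
    smallest sum of absolute errors is kept.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, cannot be written as y = a + b*x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        sae = sum(abs(yk - (a + b * xk)) for xk, yk in zip(x, y))
        if best is None or sae < best[0]:
            best = (sae, a, b)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]

# Illustrative values only: four points on y = 1 + 2x plus one aberrant point
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
y_demo = [1.0, 3.0, 5.0, 7.0, 30.0]
print(lad_fit(x_demo, y_demo))
```

In this toy example the LAD line stays on the four aligned points and ignores the aberrant one, whereas a least squares line would be pulled towards it. For larger data sets, linear programming or iteratively reweighted least squares would be the usual choice instead of exhaustive search.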

Internal Validation Criteria:

The goodness of fit was evaluated by the coefficient of determination ¹⁵:

R²=1- (13)

and standard deviation ¹⁵:

S= . (14) (15)

$y_i$ being the observed value and $\hat{y}_i$ the value computed with the LAD method. Cross-validation techniques were applied to evaluate the internal predictive ability (leave-one-out; bootstrap) and the robustness (Y-scrambling) of the model.

Cross-validation by “leave-one-out” (LOO) ¹⁵ consists in recomputing the model on (n-1) objects and using the resulting model to predict the value of the dependent variable for the isolated compound. The process is repeated for each of the n objects of the data set. The sum of the absolute values of the prediction errors (denoted by the acronym PRESS, for Predictive Residual LAD) is a measure of the dispersion of the estimates. It is used to define the coefficient of prediction ($Q^2$) and the average standard deviation of prediction (EQMP) ¹⁵:

$Q^2 = 1 - \dfrac{\sum_{i}|y_i - \hat{y}_{i/i}|}{\sum_{i}|y_i - med|}$ (16)

EQMP= (17)

The denominator is the total sum of absolute values; $\hat{y}_{i/i}$ indicates the response of object i estimated with a model obtained without using this object, and med is the median value of the n observations; the summation runs over all training compounds.

A value $Q^2 > 0.5$ is regarded as satisfactory; a value $Q^2 > 0.9$ is excellent. However, although a high value of $Q^2$ is a necessary condition for a model to have high predictive capacity, this condition alone is not sufficient.
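The leave-one-out procedure described in this section can be sketched in Python as follows. The sketch assumes that the coefficient of prediction is one minus the ratio of the sum of absolute prediction errors (PRESS) to the total sum of absolute deviations from the median, as the text describes; the names `lad_fit` and `q2_loo_lad` and the data are hypothetical, and the LAD fit is a simple exhaustive search rather than the authors' program.

```python
from itertools import combinations
from statistics import median

def lad_fit(x, y):
    """LAD line by exhaustive search over all lines through two data points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        sae = sum(abs(yk - (a + b * xk)) for xk, yk in zip(x, y))
        if best is None or sae < best[0]:
            best = (sae, a, b)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]

def q2_loo_lad(x, y):
    """Leave-one-out coefficient of prediction based on absolute errors."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without object i, then predict object i
        x_tr = x[:i] + x[i + 1:]
        y_tr = y[:i] + y[i + 1:]
        a, b = lad_fit(x_tr, y_tr)
        press += abs(y[i] - (a + b * x[i]))
    med = median(y)
    total = sum(abs(yi - med) for yi in y)
    return 1.0 - press / total

# Illustrative values only (hypothetical, exactly linear)
print(q2_loo_lad([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0]))
```

For exactly linear data every left-out object is predicted without error, so the coefficient equals 1; scattered or aberrant observations lower it.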

External validation criteria:

Equation (12) allows the calculation of $Q^2_{ext}$ ¹⁵:

=1- (18)

The data set was randomly divided into a training set (20 objects), used to develop the QSAR models, and a validation set (10 objects), used only for external statistical validation. A further parameter is also useful; we calculate it according to:

= (19)

The sum runs over the objects of the validation set.

Applicability domain (DA):

The applicability domain was examined with the Williams plot (treated in detail in ⁷⁻⁸), which represents the standardized prediction residuals:

R _i(lad)= (20)

as a function of the leverage values $h_i$. Equation (21) defines the leverage of a compound in the original space of the independent variables ($x_i$):

$h_i = x_i^T (X^T X)^{-1} x_i$ (21)

where $x_i$ is the row vector of the descriptors of compound i and X is the model matrix built from the descriptor values of the calibration set; the index T indicates the transposed vector (or matrix).

The warning value of the leverage ($h^*$) is fixed at 2(p+1)/n. If $h_i < h^*$, the probability of agreement between the measured and predicted values of compound i is as high as for the calibration compounds. Compounds with $h_i > h^*$ reinforce the model when they belong to the calibration set; otherwise their predicted values are doubtful, without necessarily being aberrant, since their residuals may be low.
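For a simple regression with an intercept, the diagonal of the hat matrix reduces to a closed form, which the Python sketch below uses. It assumes the warning value h* = 2(p+1)/n, which gives the value 0.20 quoted later in the paper for n = 20 and p = 1; the function names and the descriptor values are hypothetical.

```python
def leverages(x):
    """Leverage of each object for the model y = a + b*x (intercept included).

    Closed form of the hat-matrix diagonal: h_i = 1/n + (x_i - mean)^2 / Sxx.
    """
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

def warning_leverage(n, p=1):
    """Warning value h* = 2(p+1)/n (assumed form of the threshold)."""
    return 2.0 * (p + 1) / n

# Illustrative descriptor values only (hypothetical)
x_demo = [0.0, 1.0, 2.0, 3.0]
h = leverages(x_demo)
h_star = warning_leverage(len(x_demo))
flagged = [i for i, hi in enumerate(h) if hi > h_star]
print(h, h_star, flagged)
```

Objects whose descriptor value lies far from the mean of the calibration set receive the largest leverage; in a Williams plot they appear to the right of the vertical line drawn at h*.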

RESULTS AND DISCUSSION:

Study and numerical application:

More specifically, we test two estimation methods for the vector of parameters (a, b).

The method of ordinary least squares is the best known and most widely used; it is appropriate under certain conditions, namely when the underlying distribution of the errors is normal. If the errors are not really Gaussian and may include aberrant values, it is preferable to turn to a robust alternative that minimizes the absolute values of the errors rather than their squares.

Least Squares:

In this work we used the method of least squares, which has been widely studied. The data set was randomly divided into a training set (20 objects), used to develop the QSAR models, and a validation set (10 objects), used only for external statistical validation.

The definition of each descriptor is given in Table 1:

Table 1: Definitions of the descriptors used in the toxicity prediction model.

Descriptor	Definition
H4p	H autocorrelation of lag 4 / weighted by atomic polarizabilities.

The best model:

-LogIGC50 = f(H4p): S = 0.248, R² = 97.39%, n = 20 compounds.

The H4p descriptor (H autocorrelation of lag 4 / weighted by atomic polarizabilities) encodes information on structural fragments and therefore seems particularly suitable for describing differences in congeneric series of molecules ¹⁰.

A being the number of atoms in the molecule. Table 2 lists the compounds and their -LogIGC50 values.

Table 2: Toxicity values for the selected aliphatic alcohols and amines.

No.	Compound	-LogIGC50
1	Methanol	-2.77
2	Ethanol	-2.41
3	1-propanol	-1.84
4	1-pentanol	-1.12
5	1-hexanol	-0.47
6	1-heptanol	0.02
7	1-nonanol	0.77
8	1-decanol	1.1
9	1-dodecanol	2.07
10	1-tridecanol	2.28
11	2-propanol	-1.99
12	2-methyl-1-butanol	-1.13
13	3-methyl-1-butanol	-1.13
14	3-methyl-2-butanol	-1.08
15	(tert) pentanol	-1.27
16	1-propylamine	-0.85
17	1-hexylamine	-0.34
18	1-heptylamine	0.1
19	1-octylamine	0.51
20	1-undecylamine	2.26
21	1-butanol*	-1.52
22	1-octanol*	0.5
23	1-undecanol*	1.87
24	2-pentanol*	-1.25
25	3-pentanol*	-1.33
26	(neo)pentanol*	-0.96
27	1-butylamine*	-0.7
28	1-amylamine*	-0.61
29	1-nonylamine*	1.59
30	1-decylamine*	1.95

(*): Validation set compounds

The diagnostic statistics gathered in Table 3 make it possible to make comparisons and to draw several conclusions ¹¹.

Table 3: Diagnostic statistics of the model.

Size	Model	R² (%)	Q² (%)	Q²boot (%)	Q²ext (%)	R²adj (%)	Kxx (%)
1	H4p	97.39	96.69	96.24	93.91	97.24	0.00
		Kxy (%)	SDEP	SDEC	F	s
		98.68	0.265	0.236	670.90	0.248

The values of R² and R²adj attest to the good fitting performance of the model, which moreover is very highly significant (large value of F).

The small difference between R² and Q² (0.70%) and the small difference between Q² and Q²boot (0.45%) indicate the robustness of the model.

The close values of SDEC and SDEP mean that the internal predictive ability of the model is not too dissimilar to its fitting power.

External statistical validation attests to the good predictive ability of the model for compounds that did not participate in its calculation.

From the statistics results that models studied are the best when the standard deviation S and

The one-descriptor model equation, obtained with the Minitab 16 software (Table 4), is:

Y=-2.49+9.65*H4p (22)

Table 4: Least squares estimates for the model.

Predictor	Coef	SE Coef	T	P
Constant	-2.4981	0.09934	-25.15	0.000
H4p	9.6565	0.3728	25.90	0.000

Student's t-tests make it possible to conclude, at the chosen type I error risk, that all parameters are significant. Their estimates are a = -2.498 and b = 9.656. We then check graphically whether the errors are normally distributed:
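The entries of Table 4 can be cross-checked with a few lines of arithmetic. The sketch below recomputes the t values as coefficient divided by standard error, compares the square of the slope t value with the F statistic, and recomputes the adjusted R² from R² = 97.39%, n = 20 and p = 1. It assumes the usual relations t = coef/SE, F = t² for a one-descriptor model, and R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1); the variable names are illustrative.

```python
# Coefficients and standard errors as reported in Table 4
coefficients = {
    "Constant": (-2.4981, 0.09934),
    "H4p": (9.6565, 0.3728),
}

# t value of each parameter: estimate divided by its standard error
t_values = {name: coef / se for name, (coef, se) in coefficients.items()}

# For a single descriptor, the Fisher statistic equals the squared slope t value
f_statistic = t_values["H4p"] ** 2

# Adjusted R2 from the reported R2, with n = 20 compounds and p = 1 descriptor
r2, n, p = 0.9739, 20, 1
r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

print(t_values, f_statistic, r2_adj)
```

The recomputed values agree, within rounding, with the reported T column (-25.15 and 25.90), with F = 670.90, and with the adjusted R² of 97.24%.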

Normality of the errors:

The plot shows that the distribution of the residuals is disturbed (Fig. 1), so we rely on the following tests:

The Anderson-Darling test gives AD = 1.575 (Figs. 1 and 2), which confirms the disturbance and indicates that the distribution is not compatible with the normal law.

Fig. 2 shows a negative (left) asymmetry, with skewness coefficient sk = 2.14 and kurtosis coefficient ku = -5.75. Together, the skewness and kurtosis coefficients give the Jarque-Bera statistic, a normality test of the errors with two degrees of freedom (JB = 42.84).

The preceding tests show that the distribution of the errors is not normal.
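The reported Jarque-Bera value can be checked from the reported coefficients. The sketch below assumes that the kurtosis coefficient is an excess kurtosis (already measured relative to 3), that only the magnitudes of the two coefficients matter because they are squared, and that n = 20 residuals of the training set were used; these are assumptions, not statements from the authors. The critical value uses the fact that, for two degrees of freedom, the chi-square survival function is exp(−x/2).

```python
import math

def jarque_bera_from_moments(n, skewness, excess_kurtosis):
    """JB = n/6 * (S^2 + K_excess^2 / 4)."""
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)

def chi2_critical_2df(alpha):
    """Critical value of the chi-square law with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)

# Reported magnitudes: sk = 2.14, ku = 5.75; n = 20 is an assumption
jb = jarque_bera_from_moments(n=20, skewness=2.14, excess_kurtosis=5.75)
critical = chi2_critical_2df(0.05)
print(jb, critical, jb > critical)
```

Under these assumptions the statistic comes out close to the reported 42.84 and far above the 5% critical value of about 5.99, which is consistent with the rejection of normality.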

Fig. 1. Probability plot of errors.

Fig. 2. Summary plot for residuals (training, test).

The goodness of fit:

We note from the plot of y = f(x) that all points lie on the line except for a few abnormal points (aberrant values).

Since the distribution of the least squares errors is not normal and may include aberrant values, it is preferable to turn to a more robust alternative that minimizes the absolute values of the errors (Fig. 3).

Fig. 3. Log IGC50 value vs H4p descriptor value.

Least Absolute Deviation Method:

We treat the same model with the least absolute deviation (LAD) method. Because this method is non-parametric, its diagnostic statistics were derived from the theoretical relations of the parametric least squares method. This work required a long effort, and these relationships are published here for the first time (Fig. 4).

Fig. 4. Histogram of the correlation coefficient.

The least absolute deviation method gave the true values of the parameters, which reflects the logic of the model (Table 5).

Table 5: Diagnostic statistics for the selected model by the least absolute deviation (LAD) method.

Descriptor	N_training	N_test	R² (%)	Q² (%)	Q²ext (%)
H4p	20	10	87.96	87.96	79.81
R²adj (%)	EQMC	EQMP	EQMPext	F	S
92.84	37.92	37.92	54.89	149.59	0.399

The 20 compounds used for training (model development) are well correlated with the descriptor, hence the large value of the coefficient of determination (R² > 50%). The model has very good predictive capacity, confirmed by the value of Q² > 50%. The equality of R² and Q² indicates the robustness of the model, which is moreover very highly significant (high value of Fisher's F statistic). Besides, the similarity of EQMC and EQMP means that the internal predictive capacity of the model is not too dissimilar to its fitting capacity. The value of Q²ext informs us about the validity of the model and its capacity to predict values that were not used to generate it.

The one-descriptor model equation, obtained with calculation programs written in MATLAB ¹⁵, is:

-Log IGC50 = -2.51 + 9.64*H4p (23)

The strong positive correlation (r = 0.99) in Table 6 and Fig. 4 indicates that when H4p increases, -Log IGC50 also tends to increase.

Table 6: Correlation matrix.

	-Log IGC50
H4p	0.987
P-value	0.000

Student's t-tests make it possible to conclude, with a type I error risk of α = 0.05, that all parameters are significant. Their estimates are a = -2.51 and b = 9.64. We check this graphically from the distribution of the points for the relationship between the calculated values and the H4p descriptor, y = f(x) (Table 7).

Table 7: Least absolute deviation estimates for the model.

Predictor	Coef	SE Coef	T	P
Constant	-2.51	0.1593	-15.761	0.006
H4p	9.643	0.0415	232.625	0.000

We notice from Fig. 5 that the points (20 training, 10 test) lie directly on the line, which indicates complete agreement between the calculated values and the descriptor.

There is a direct positive relationship between the H4p descriptor and -Log IGC50.

We note from Fig. 6 that all training points lie between the interval lines (-2, 2), except for a single point outside the interval.

Fig. 5. -Log IGC50 calculated vs H4p descriptor by the LAD method.

The analysis of the residuals of the least absolute deviation (LAD) estimate (Fig. 6) shows that, in the training set, compound n° 16 (1-propylamine, a compound of strong toxic power) has the highest residual and observation 10 (1-tridecanol) has a high leverage value. In the validation set, all points lie within the interval (-2, 2), but three compounds, n° 3 (1-undecanol), n° 9 (1-nonylamine) and n° 10 (1-decylamine), exceed the leverage warning value (h* = 0.20).

Fig. 6. Line plot of the LAD model.

The analysis was then repeated twice, removing the aberrant compounds:

*16 (1-propylamine), a compound of strong toxic power.

*20 (1-undecylamine), a compound of strong toxic power.

The new model, computed with MATLAB ¹⁵, has very good statistics:

S = 0.303, R² = 94.18%

Equation of the model obtained with MATLAB:

-Log IGC50 = -2.51 + 9.71*H4p (24)

The coefficients of the line hardly change after removal of the aberrant values, which shows that the line is stable: the least absolute deviation (LAD) method is not sensitive to the presence of aberrant values. We deduce that the LAD method is stable and more robust.

No aberrant compounds remain in Fig. 7. High-leverage compounds, however, can have a strong influence on the results:

Training set:

*10 (1-tridecanol).

Test set:

*3 (1-undecanol)

*10 (1-decylamine).

These compounds have a major influence.

Fig. 7. Line plot of the LAD model (after removing aberrant values).

Interpretation of the model:

The acute aquatic toxicity model predicts the concentration of a substance that inhibits 50% of the growth (IGC50) of the Tetrahymena pyriformis population (Fig. 7).

The H4p descriptor (H autocorrelation of lag 4 / weighted by atomic polarizabilities) encodes information on structural fragments and therefore seems particularly suitable for describing differences in congeneric series of molecules ¹⁰.

A single descriptor was able to model the concentration (IGC50) for Tetrahymena pyriformis. The value of the coefficient of the GETAWAY descriptor H4p (9.71) in Equation (24), together with the correlation coefficient (r = 0.99) (Fig. 8) and the determination coefficient (95.22%), shows the consistently positive effect of this descriptor on the value of -LogIGC50 by the least absolute deviation (LAD) method.

Fig. 8. Histogram of the regression coefficients.

CONCLUSION:

The method of least squares (L2 estimate) is the best method of regression if the errors are normally distributed; if they are not, the treatment must be changed to a robust alternative that tolerates aberrant values, which is what was done in this study.

The GETAWAY descriptor H4p (H autocorrelation of lag 4 / weighted by atomic polarizabilities) was selected to model the inhibition of Tetrahymena pyriformis growth by 30 compounds (21 aliphatic alcohols and 9 amines), divided into a training set of 20 and a test set of 10, in a simple linear regression by the least squares method.

To examine the normality of the residuals, the results were checked with the Anderson-Darling test (AD = 1.57), which is not compatible with the normal law; with the skewness coefficient (2.14), which indicates an asymmetric distribution skewed to the left; and with the Jarque-Bera test, a normality test of the residuals with two degrees of freedom (42.84). We deduce that the distribution of the residuals is not normal.

We then changed the treatment to the least absolute deviation (LAD) method. Because this method is non-parametric, we derived its diagnostic parameters from those of the least squares estimate, which allowed us to detect the aberrant data and to deal with them when their effect was significant.

For least absolute deviation (LAD) regression, robustness comes from its low sensitivity to aberrant values, which hardly change the slope of the regression line. We therefore removed the anomalous point (1-propylamine) and repeated the analysis to make sure that no new aberrant value appeared.

By withdrawing observation 16 (1-propylamine), we find for LAD a = -2.51 and b = 9.71, which shows that the estimated parameters stabilize around the true values when aberrant values are removed.

In this work, the most significant effect of the aberrant data is on the coefficient of determination, which increased by 6.22 percentage points, from 87.96% to 94.18%, after removal of the aberrant compounds. This is interpreted as a better fit: the coefficient of determination is positively affected by the absence of aberrant values. The standard deviation of the residuals also improved, decreasing by about 24%, from 0.399 to 0.303, after removal of the aberrant values.

The model is good and statistically strong.

CONFLICTS OF INTEREST:

The authors declare that there are no conflicts of interest.

REFERENCES:

1. Khadidja Bellifa, Sidi Mohamed Mekelleche.2012. QSAR study of the toxicity of nitrobenzenes to Tetrahymena pyriformis using quantum chemical descriptors. Arabian Journal of Chemistry, xxx, xxx–xxx

2. Nadia Ziani, Khadidja Amirat and Djelloul Messadi.2014 Inhibition of Tetrahymena pyriformis growth by Aliphatic Alcohols and Amines: a QSAR Study. Rev. Sci. Technol., Synthèse 29: 51-58.

3. Stefan M. Kohlbacher, Thierry Langer and Thomas Seidel.2021. QPHAR: quantitative pharmacophore activity relationship: method and validation. Journal of Cheminformatics 13, Article number: 57.

4. Sajjad Bordbar, Mostafa Alizadeh, Sayyed Hojjat Hashemi. 2013. Effects of microstructure alteration on corrosion behavior of welded joint in API X70 pipeline steel. Materials and Design (ScienceDirect), Elsevier. Vol. 45, 597-604.

5. Fabrizio Fratini, Patrizia Tettamanzi.2015. Corporate Governance and Performance: Evidence from Italian Companies. Open Journal of Business and Management. Vol.3 No.2.

6. Samah Anwar, Bahaa Khalil, Mohamed Seddik, Abdelhamid Eltahan, Aiman El Saadi.2022. A nonparametric statistical approach for the estimation of water quality characteristics in ungauged streams/watersheds. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2022.128174.

7. Eriksson, L., Jaworska, J., Worth, A., Cronin, M., Mc Dowell, R.M., Gramatica, P. (2003). Methods for reliability, uncertainty assessment, and applicability evaluations of regression based and classification QSPRs. Environmental Health Perspective Journal, 111(10):1361-1375. https://doi.org/10.1289/ehp.5758.

8. Tropsha, A., Gramatica, P., Grombar, V.K. (2003). The importance of being Earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR and Combinatorial Science, 22(1): 69-76. https://doi.org/10.1002/qsar.200390007.

9. Hyperchem TM Release 6.03 for Windows, Molecular Modeling System, 2000.

10. Todeschini, R., Consonni, V., Dragon, P.M. (2006). Software for the Calculation of Molecular Descriptors. Release 5.3 for windows, Milano.

11. Todeschini, R., Ballabio, D., Consonni, V., Mauri, A., Pavan, M. (2009). MOBY DIGS software for multilinear regression analysis and variable subset selection by genetic algorithm. Release 1.1 for Windows, Milano.

12. MINITAB, Release 13.31, Statistical Software, 2000.

13. Estrada, E. and Molina, E. 2001. Novel local (fragment-based) topological molecular descriptors for QSPR/QSAR and molecular design. Journal of Molecular Graphics and Modelling. 20(1), 54-64. doi:10.1016/S1093-3263(01)00100-0. PMID: 11760003.

14. Goodarzi, M., Jensen, R. and Vander Heyden, Y. 2012. QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regressions. Journal of Chromatography B: Analytical Technologies in the Biomedical and Life Sciences. 494, doi:10.1016/j.jchromb.2012.01.012. PMID: 22341354.

15. MATLAB Version 7.0.0.19920 (Release 14), The Language of Technical Computing. The MathWorks, Inc. May 06 (2004).

16. Zeeman M., Aver C.M., Clements R.G., Nabholtz J.V. and Boethling R.S., 1995. U.S. EPA Regulatory Perspectives on the use of QSAR for new and existing chemical evaluations SAR QSAR, Environmental. Research, Vol. 3(3),179-201.

17. Walker J.D., 2003. Applications of QSARs in toxicology: a US Government perspective, Journal of Molecular Structure - Theochem, Vol. 622(1-2), 167-184.

18. Bradbury S.P., Russon C.L., Ankley G.T., Schultz T.W. and Walker J.D., 2003. Overview of data and conceptual approaches for derivation of Quantitative Structure –Activity Relationships, for ecotoxicological effects of organic chemicals Environmental Toxicology and Chemistry, Vol. 22 (8), 1789-1798.

19. European Commission. White Paper on a strategy for a future Community Policy for Chemicals, 2001. http://europa.eu.int/comm/enterprise/reach/.

20. Toussaint M.W., Shedd T.R., Van der Schalie W.H. and Leather G.R., 1995. A comparison of standard acute toxicity tests with rapid screening toxicity tests. Environmental Toxicology and Chemistry Vol. 14(5), 907-915.

21. Kubinyi H., 2002. From Narcosis to Hyperspace: The History of QSAR, Quantitative Structure-Activity Relationships, Vol. 21(4), 348-356. http://ecb.jrc.it/QSAR/.

22. Schultz T.W., Cronin M.T.D., Walker J.D. and Aptula A.O., 2003.Quantitative structure –activity relationships (QSARs) in toxicology: a historical perspective, Journal of Molecular Structure – Theochem, Vol.622(1-2), 1-22.

23. Posthumus R. and Slooff W., 2001. Implementation of QSARs in ecotoxicological risk assessments RIVM report. 601516003.

24. Dearden J.C., 2002. Prediction of Environmental Toxicity and Fate Using Quantitative Structure – Activity Relationschips (QSARs), Journal of Brazilian Chemical Society, Vol 13 (6), 754- 762.

25. Schultz T.W., Cronin M.T.D. and Netzeva T.I., 2003. The present status of QSAR in toxicology, Journal of Molecular Structure -Theochem. Vol. 622 (1- 2), 23-38.

26. Cronin M.T.D. and Dearden J.C., 1995. QSAR in toxicology. Prediction of Aquatic Toxicity, Quantitative Structure. -Activity Relationship Vol.14(1), 1-7.

27. Mannhold R. and van de Waterbeemdt H., 2001.Substructure and whole molecule approaches for calculating logP, Journal of Computer- Aided Molecular Design, Vol. 15(4), 337-354.

28. Mannhold R. and Rekker R.F., 2000. The hydrophobic fragmental constant approach for calculating logP in octanol/water and aliphatic hydrocarbon/water systems. Perspectives in Drug Discovery and Design, Vol.18(1), 1-18.

29. Benfenati E., Gini G., Piclin N., Roncaglioni A. and Vari M.R., 2003.Predicting log P of pesticides using different software, Chemosphere, Vol.53(9), 1155-1164.

30. Klopman G., Li J.K., Wang S. and Dimayuga M., 1994.Computer Automated log P calculations based on an extended group contribution approach, Journal of Chemical. Information Computer Sciences, Vol.34(4),752-781.

31. Kaiser K.L.E., 2003. The use of neural networks in QSARs for acute aquatic toxicological endpoints, Journal of.Molecular Structure. Theochem, Vol .622(1-2), 85-95.

32. Papa E., Villa F. and Gramatica P., 2005. Statistically Validated QSARs Based on Theoretical Descriptors , for Modeling Aquatic Toxicity of Organic Chemicals in Pemiphales promelas (Fathead Minnow ), Journal of Chemical Information and Modeling, Vol.45(5), 1256-1266.

33. Roy K. and Ghosh G., 2009.QSTR with extended topochemical atom (ETA) indices. 12. QSAR for the toxicity of diverse aromatic compounds to Tetrahymena pyriformis using chemometric tools. Chemosphere, Vol. 77(7), 999-1009.

34. Zhao Y.H., Zhang X.J., WEN Y., Sun F.T., Guo Z., Qin W.C., Qin H.W.,Xu J.L., Sheng L.X. and Abraham M.H., 2010.Toxicity of organic chemicals to Tetrahymena pyriformis: Effect of polarity and ionization on toxicity. Chemosphere, Vol. 79(1), 72-77.

35. Roy K. and Das R.N., 2010.QSTR with extended topochemical atom (ETA) indices.14. QSAR modeling of toxicity of aromatic aldehydes to Tetrahymena pyriformis. Journal of Hazardous Materials, Vol. 183(1-3), 913-922.

36. Bouaoune A., Lourici L., Haddag H. and Messadi D.,2012. Inhibition of Microbial Growth by anilines: A QSAR study, Journal of Environmental Science and Engineering., A1, Vol. 1(5A), 663-671.

37. Hill D.L., 1972. The Biochemistry and Physiology of Tetrahymena. Academic Press, New York and London,230p.

38. Schultz T.W., Lin D.T., Wilke T.S. and Arnold L.M.,1990. Quantitative structure-activity relationsh

39. Tiffany Machabert .2014 "Modèles en très grande dimension avec des outliers. Théorie, simulations, applications" paris.

40. Soner Çankaya, Samet Hasan Abacı. 2015. A Comparative Study of Some Estimation Methods in Simple Linear Regression Model for Different Sample Sizes in Presence of Outliers. Turkish Journal of Agriculture - Food Science and Technology. ISSN: 2148-127X.

41. Jiehan Zhu and Ping Jing.2010. The Analysis of Bootstrap Method in Linear Regression Effect. Journal of Mathematics Research Vol. 2, No. 4.

42. Yinbo Li and Gonzalo R. Arce. 2004. A Maximum Likelihood Approach to Least Absolute Deviation Regression. EURASIP Journal on Applied Signal Processing. 12, 1762–1769.

43. Gonzalez, M.P., Teran, C., Saiz-Urra, I. and Teijeira, M. 2008. Variable selection methods in QSAR: an overview. Current Topics in Medicinal Chemistry. 8(18), 1606-1627. doi:102174/156802608786552. PMID: 19075770.

44. Roman Kaliszan, Tomasz Ba̧czek, Adam Buciński, Bogusław Buszewski, Małgorzata Sztupecka. 2003. Prediction of gradient retention from the solvent strength (LSS) model, quantitative structure-retention relationships (QSRR), and artificial neural networks (ANN). Journal of Separation Science. Volume 26, Issue 3-4.

45. Berlin, G.B. 1982 The Pyrazine; Wiley-Interscience: New York.

46. Pynnönen, Seppo and Timo Salmi (1994). A Report on Least Absolute Deviation Regression with Ordinary Linear Programming. Finnish Journal of Business Economics 43:1, 33-49.

47. Dodge, Y. et Valentin Rousson (2004). Analyses de regression appliquée. paris.

48. Faria, S. and Melfi, G. (2006). Lad regression and nonparametric methods for detecting outliers and leverage points. Student, 5 :265– 272.

49. Gabriela Ciuperca. (2009). Estimation robuste dans un modè paramétrique avec rupture. Bordeaux.

50. Gilbert Saporta. (2012). Régression robuste.

51. Ndèye Niang- Gilbert Saporta. (2014).Régression robuste Régression non-paramétrique.

52. Dr. Nadia H. AL – Noor and Asmaa A. Mohammad. 2013. Model of Robust Regression with Parametric and Nonparametric Methods. Journal of Mathematical Theory and Modeling Vol.3, No.5.

53. Dodge, Y. (2004). Statistique: Dictionnaire encyclopédique.

54. Dodge, Y. and Jureckova, J. (2000). Adaptive Regression. Springer-Verlag New York.

55. Nornadiah Mohd Razali, Yap Bee Wah. 2011. Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modeling and Analytics. Vol. 2, No. 1: 21-33.

Received on 20.09.2022 Modified on 11.02.2023

Asian J. Research Chem. 2023; 16(3):195-204.

DOI: 10.52711/0974-4150.2023.00031